PoC cache configuration control #7060
Expanded boards.txt.py to allow new MMU options and create revised .ld's. Updated eboot to pass 48K IRAM segments. Added Cache_Read_Enable intercept to modify call for 16K ICACHE. Updated platform.txt to pass new MMU options through to compiler and linker preprocessor. Added quick example: esp8266/MMU48K
Added MMU_ qualifier to new defines. Moved changes into their own file. Don't know how to fix platformio issue.
Updated tools/sizes.py to report correct IRAM size and indicate ICACHE size. Merged in earlephilhower's work on unaligned exception. Refactored and added support for store operations and changed the name to be more closely aligned with its function. Improved crash reporting path.
Very cool, and probably of much more general use than my virtual memory setup (although being able to malloc an 8MB block was kind of neat...). I heard the impact to performance was minimal in your tests, which is even more amazing. One thing I see you didn't pull in is the changes I did to make the memory accessible. When using the non32b exception handler, UMM can manage the memory as just another heap, allowing users to malloc() and new() with impunity. I think that would make it much more usable for general folks. The interface could be the stack one I tried (worked well for me and let me hack libs to silently use internal/fast or external/slow), or @devyte was thinking more of a single flag (normal/iram/external) for simplicity. Any thoughts from you on malloc interfaces? Also, PlatformIO is not liking the .ld->.h conversions you've done. I can't quite grok what's wrong and have never been able to get a working general PIO install running on my system, but you might want to take a look.
@earlephilhower Thanks for the kind words.
Yes, that was in a different PR that I meant to go back and look at. I was hoping that I could adapt UMM to do that when I added the non32b; however, I think I see an issue in the ROMs I also had trouble with the Hello Bear example when I fixed it up with an iRAM stack directly. My memory has faded on the specifics, I don't remember the kind of crash. I suppose it could be interrupt related. --- Just looked through your PR --- Not sure why I had trouble and you didn't with the Bear stuff. I'll have to try that again. I am also thinking that UMM should revert back to internal DRAM, anytime it is called with interrupts disabled. That would result in all the iRAM allocations occurring in the foreground and maybe for exception processing disable IRQs at the start and leave it to the exit logic to restore PS.
Not sure. While simplicity is usually best, this is not looking very simple. DRAM, IRAM, FLASH, and external SRAM each have different performance properties and concerns. For now, I'll look at merging in your umm_malloc changes.
Yea I didn't know what to do about that. I have an idea now that might work better w/o all the renaming to |
Added some inline functions to aid in byte and short access to iRAM. * only byte read has been tested Updated .ld file to work better with platform.io; however, I am still missing some steps, so platformio will still fail.
I was unable to get the SSL stack in external SRAM working, and I doubt that you'll be able to make it work in IRAM, either. There is no problem other than it's too slow and accessed on non-word boundaries, and accessed too intensely. I get a WDT w/external SRAM for the BSSL stack, and I would bet that's what you're seeing too. I actually moved the 17KB SSL buffer (i.e. the per-connection info) to SRAM and that ran fine and any perf. difference was undetectable. I also moved the String() allocator to it, too, and also had no issue in anything I tested. It's probably a matter of adding more optimistic_yield()s in the library, but I didn't look into it.
master was missing new additions added by boards.txt.py in the PR. Which the CI flags when it rebuilds boards.txt.
Adapted changes to umm_malloc, Esp.cpp, StackThunk.cpp, WiFiClientSecureBearSSL.cpp, and virtualmem.ino to irammem.ino from @earlephilhower PR esp8266#6994. Reworked umm_malloc to use context pointers instead of copy context. umm_malloc now supports allocations from IRAM. Added class HeapSelectIram, ... to aid in selecting alternate heaps, modeled after class InterruptLock. Restrict alloc request from ISRs to DRAM. Never ending improvements to debug printing. Sec Heap option now pulls in free IRAM left over in the 1st 32K block. Managed through umm_malloc with HeapSelectIram. Updated examples.
Don't know what to do with platformio; it doesn't like my .S file. ifdef out USE_ISR_SAFE_EXC_WRAPPER to block the new assembly module from building on platformio only.
Re eb9882e. I don't see any errors showing up in the CI... if that's the reason for confusion about PIO issues, here some locally rolled back to 91fc391. Paths are cleaned up for readability:
@mcspr Thank you! That was my problem.
resolved conflicts in boards.txt.py and platform.txt
Resolved boards.txt conflict.
Limited access to some detailed typedefs/prototypes to .cpp modules, to avoid future build conflicts. Completed TODO for verifying that the "C" structure struct __exception_frame matches the ASM version. Fixed some typos, code rot, and added some more cases in example irammem.ino. Refactored a little and reordered printing to ease comparison between methods. Corrected `#ifdef __cplusplus` coverage area. Cleaned up `extern "C" ...` usage. Fixes issues with including mmu_iram.h or esp8266_undocumented.h in .c files.
Default menu option changes nothing so it should be safe to merge.
Changes are numerous but quite clear if time is spent to read them.
In my humble opinion, this feature will give a second life to esp8266/arduino where memory is sometimes tight.
I tried it with esp8266Audio which requires a fair amount of cpu power for decoding, and all went well with the shared second heap allowing a bigger audio buffer and much more free DRAM.
Thank you for this huge work @mhightower83 !
Some comment tuning. In the context of _xtos_set_exception_handler and the functions it registers, changed to type int for exception cause type. This is also the type used by gdbstub and some other Xtensa files I found.
uint32_t v;
} mmu_cre_status_t;
extern mmu_cre_status_t mmu_status;
That has been removed.
Congratulations @mhightower83, this is an awesome feature!
@devyte Thanks! It looks like that little POC for 16K/32K cache selection grew a little bit. I want to thank @earlephilhower for his work on a virtual memory PR, which I leveraged to get started with the exception handler and 1st Heap selection API.
* add double-quotes to `compiler.S.flags` * fix windows-specific processes (`recipe.hooks.linking.prelink.[12].pattern.windows`) * rewrite processing of "mkdir" and "cp" in python because of platform-independence
* Fix: cannot build after #7060 on Win64 * add double-quotes to `compiler.S.flags` * fix windows-specific processes (`recipe.hooks.linking.prelink.[12].pattern.windows`) * rewrite processing of "mkdir" and "cp" in python because of platform-independence * make consistent with the use of quotation marks in other *.py files
…lash * upstream/master: (72 commits) Typo error in ESP8266WiFiGeneric.h (esp8266#7797) lwip2: use pvPortXalloc/vPortFree and "-free -fipa-pta" (esp8266#7793) Use smarter cache key, cache Arduino IDE (esp8266#7791) Update to SdFat 2.0.2, speed SD access (esp8266#7779) BREAKING - Upgrade to upstream newlib 4.0.0 release (esp8266#7708) mock: +hexdump() from debug.cpp (esp8266#7789) more lwIP physical interfaces (esp8266#6680) Rationalize File timestamp callback (esp8266#7785) Update to LittleFS v2.3 (esp8266#7787) WiFiServerSecure: Cache SSL sessions (esp8266#7774) platform.txt: instruct GCC to perform more aggressive optimization (esp8266#7770) LEAmDNS fixes (esp8266#7786) Move uzlib to master branch (esp8266#7782) Update to latest uzlib upstream (esp8266#7776) EspSoftwareSerial bug fix release 6.10.1: preciseDelay() could delay() for extremely long time, if period duration was exceeded on entry. (esp8266#7771) Fixed OOM double count in umm_realloc. (esp8266#7768) Added missing check for failure on umm_push_heap calls in Esp.cpp (esp8266#7767) Fix: cannot build after esp8266#7060 on Win64 (esp8266#7754) Add the missing 'rename' method wrapper in SD library. (esp8266#7766) i2s: adds i2s_rxtxdrive_begin(enableRx, enableTx, driveRxClocks, driveTxClocks) (esp8266#7748) ...
Thank you for this amazing option. It really comes in handy, especially when using SSL, helping the ESP8266 instead of replacing it with an ESP32. As a side note for newcomers: when you use the tool, the extra heap won't show with the usual freeHeap function, but it's there and working.
@efitrillo That is good to hear. The well-established Heap APIs, like freeHeap, work with the current Heap selected.
#include <umm_malloc/umm_heap_select.h>
#ifdef UMM_HEAP_IRAM
{
// Note, the current heap does not change if the IRAM Heap was not in the
// build option. In that case, ESP.getFreeHeap() will report free DRAM space.
HeapSelectIram ephemeral;
Serial.printf("IRAM free: %6d\r\n", ESP.getFreeHeap());
}
#else
Serial.printf("IRAM free: 0\r\n");
#endif
MMU - Adjust the Ratio of ICACHE to IRAM
The Arduino IDE Tools menu has a new option, `MMU`. Possible selections are:
* 32KB cache + 32KB IRAM (balanced)
* 16KB cache + 48KB IRAM (IRAM)
* 16KB cache + 48KB IRAM and 2nd Heap (shared)
  * `umm_malloc` library to work.
  * `malloc` APIs.
  * `HeapSelect` class.
* 16KB cache + 32KB IRAM + 16KB 2nd Heap (not shared)
  * `umm_malloc` library
New build defines and possible values. These are the results of the menu options described above:
| `#define` | balanced | IRAM | shared | not shared |
|---|---|---|---|---|
| `MMU_IRAM_SIZE` | `0x8000` | `0xC000` | `0xC000` | `0x8000` |
| `MMU_ICACHE_SIZE` | `0x8000` | `0x4000` | `0x4000` | `0x4000` |
MMU_IRAM_HEAP
umm_malloc
MMU_SEC_HEAP
> _text_end
**> _text_end
**0x40108000
MMU_SEC_HEAP_SIZE
0x4000
** These defines are to inline functions that calculate the values, based on unused code space.
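
As a quick sanity check on the sizes above, the sketch below tabulates the four menu options and verifies that each one adds up to the same 64KB of instruction RAM. The struct, the function names, and the `kIramBase` value of `0x40100000` are assumptions made for this illustration and are not taken from the PR; the base address is chosen because it is consistent with the `0x40108000` start address listed for the not-shared heap.

```cpp
#include <cassert>
#include <cstdint>

// Assumed for illustration: start of IRAM and total instruction RAM size.
constexpr uint32_t kIramBase   = 0x40100000u;
constexpr uint32_t kTotalBytes = 0x10000u;  // 64KB

// Hypothetical record for one MMU menu option.
struct MmuOption {
    const char* name;
    uint32_t iram_size;            // MMU_IRAM_SIZE
    uint32_t icache_size;          // MMU_ICACHE_SIZE
    uint32_t dedicated_heap_size;  // MMU_SEC_HEAP_SIZE, 0 when no fixed block is reserved
};

constexpr MmuOption kOptions[] = {
    {"balanced",   0x8000u, 0x8000u, 0x0000u},
    {"IRAM",       0xC000u, 0x4000u, 0x0000u},
    {"shared",     0xC000u, 0x4000u, 0x0000u},  // 2nd heap lives in unused IRAM
    {"not shared", 0x8000u, 0x4000u, 0x4000u},  // 2nd heap has its own 16KB block
};

// IRAM + ICACHE + dedicated heap block for one option.
constexpr uint32_t total_bytes(const MmuOption& o) {
    return o.iram_size + o.icache_size + o.dedicated_heap_size;
}

// Address right after the IRAM region, where a dedicated heap block would start.
constexpr uint32_t dedicated_heap_start(const MmuOption& o) {
    return kIramBase + o.iram_size;
}

static_assert(total_bytes(kOptions[0]) == kTotalBytes, "balanced must total 64KB");
static_assert(total_bytes(kOptions[1]) == kTotalBytes, "IRAM must total 64KB");
static_assert(total_bytes(kOptions[2]) == kTotalBytes, "shared must total 64KB");
static_assert(total_bytes(kOptions[3]) == kTotalBytes, "not shared must total 64KB");
static_assert(dedicated_heap_start(kOptions[3]) == 0x40108000u,
              "not-shared heap starts right after 32KB of IRAM");
```

The checks run at compile time, so a mismatch between the option sizes would show up as a build error rather than a runtime surprise.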
IRAM, unlike DRAM, must be accessed as aligned full 32-bit words, no byte or short access.
I assume pgm_read macros would work; however, the store operation would remain an issue. `ets_memcpy` appears to work well as long as the byte count is rounded up to a multiple of 4.
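
To make the alignment constraint concrete, here is a minimal sketch of reading and writing a single byte using only aligned 32-bit loads and stores, plus a helper that rounds a byte count up to a multiple of 4. The function names are hypothetical and are not the inline helpers added in this PR; the byte ordering assumes a little-endian target, where byte offset 0 is the least significant byte of the word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read one byte through an aligned 32-bit load (little-endian layout assumed).
static inline uint8_t read_byte_aligned(const void* p) {
    const uintptr_t addr = reinterpret_cast<uintptr_t>(p);
    const volatile uint32_t* word =
        reinterpret_cast<const volatile uint32_t*>(addr & ~static_cast<uintptr_t>(3));
    const unsigned shift = static_cast<unsigned>(addr & 3u) * 8u;
    return static_cast<uint8_t>(*word >> shift);
}

// Write one byte with a 32-bit read-modify-write of the containing word.
static inline void write_byte_aligned(void* p, uint8_t value) {
    const uintptr_t addr = reinterpret_cast<uintptr_t>(p);
    volatile uint32_t* word =
        reinterpret_cast<volatile uint32_t*>(addr & ~static_cast<uintptr_t>(3));
    const unsigned shift = static_cast<unsigned>(addr & 3u) * 8u;
    uint32_t w = *word;
    w &= ~(static_cast<uint32_t>(0xFFu) << shift);  // clear the target byte lane
    w |= static_cast<uint32_t>(value) << shift;     // insert the new byte
    *word = w;
}

// Round a byte count up to the next multiple of 4, as suggested for ets_memcpy.
static inline size_t round_up_to_word(size_t n) {
    return (n + 3u) & ~static_cast<size_t>(3);
}
```

The read-modify-write in `write_byte_aligned` is not atomic, so on real hardware it would need protection if an interrupt could touch the same word.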
Non-32-Bit Access
Pulled in work from earlephilhower's PR #6978, updated/refactored to handle writes to iRAM and more. This allows word and byte access to iRAM through a load/store exception handler. It is best used for infrequently accessed data. Expect it to be very slow: each character access requires a complete save and restore of all 16+ registers.
The Arduino IDE Tools menu has a new option, `Non-32-Bit Access`. Selections are:
* Use pgm_read macros for IRAM/PROGMEM
* Byte/Word access to IRAM/PROGMEM (very slow)
To get a sense of how memory access time is affected, see examples `MMU48K` and `irammem` in `ESP8266`.
Miscellaneous
For calls to `umm_malloc` with interrupts disabled.
* `malloc` will always allocate from `DRAM` when called with interrupts disabled.
* `realloc` will fail if not built with `USE_ISR_SAFE_EXC_WRAPPER` defined.
* `USE_ISR_SAFE_EXC_WRAPPER` defined in `mmu_iram.h`.
* `realloc` requests that require `malloc` to complete, will allocate from `DRAM`.
ISR/Exception Handler Issue
The non-32-bit exception handler is called by a "C" wrapper function in ROM. This ROM function enables interrupts before calling our registered handler. Defining `USE_ISR_SAFE_EXC_WRAPPER` in `mmu_iram.h` will install a replacement that does not enable interrupts (now default). The effects on network performance are unknown.
To keep ISR execution time with interrupts disabled at a minimum, avoid the use of IRAM from ISRs, especially non-32-bit reads/writes on IRAM.
How to Select Heap
The `MMU` selection `16KB cache + 48KB IRAM and 2nd Heap (shared)` allows you to use the standard heap API function calls (`malloc`, `calloc`, `free`, ...) to allocate memory from DRAM or IRAM. The selection can be made by instantiating the class `HeapSelectIram` or `HeapSelectDram`. The usage is similar to that of the `InterruptLock` class. The default/initial heap source is DRAM. The class is in `umm_malloc/umm_malloc.h`.
Low level functions for selecting a heap. These are used by the above classes:
* `umm_get_current_heap_id()`
* `umm_set_heap_by_id( ID value )`
  * `UMM_HEAP_DRAM`
  * `UMM_HEAP_IRAM`
  * (code present in umm_malloc only, not enabled) `UMM_HEAP_EXTERNAL`
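
The scoped-selection pattern described above (save the current heap ID on construction, restore it on destruction) can be sketched on a host machine as follows. Everything in the `mock` namespace is a stand-in written for this illustration, including the numeric ID values; it only imitates the get/set pair named in the list and is not the `umm_malloc` implementation. `ScopedHeapSelect` is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

namespace mock {
// Hypothetical heap IDs; the real values in umm_malloc may differ.
enum HeapId : size_t { kDram = 0, kIram = 1 };

static size_t current_heap = kDram;  // DRAM is the default/initial heap

inline size_t get_current_heap_id() { return current_heap; }
inline void set_heap_by_id(size_t id) { current_heap = id; }
}  // namespace mock

// RAII guard: selects a heap for the lifetime of the object, then restores
// whichever heap was active before it was created.
class ScopedHeapSelect {
public:
    explicit ScopedHeapSelect(size_t id) : saved_(mock::get_current_heap_id()) {
        mock::set_heap_by_id(id);
    }
    ~ScopedHeapSelect() { mock::set_heap_by_id(saved_); }

    ScopedHeapSelect(const ScopedHeapSelect&) = delete;
    ScopedHeapSelect& operator=(const ScopedHeapSelect&) = delete;

private:
    size_t saved_;
};

// Returns the heap ID that code inside a guarded scope would observe.
inline size_t heap_seen_inside_scope() {
    ScopedHeapSelect select(mock::kIram);
    return mock::get_current_heap_id();
}
```

Because the restore happens in the destructor, early returns inside the guarded scope still put the previous heap back, which is the same property that makes the `InterruptLock` style convenient.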
Also, APIs added from earlephilhower's PR #6978 are:
* `ESP.setIramHeap()` Pushes current heap onto a stack and sets IRAM heap.
* `ESP.setDramHeap()` Pushes current heap onto a stack and sets DRAM heap.
* `ESP.resetHeap()` Restores previously pushed heap.
Updated to reflect current features in the PR.
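
The push/restore behaviour of `ESP.setIramHeap()`, `ESP.setDramHeap()`, and `ESP.resetHeap()` can be pictured as a small fixed-depth stack of heap IDs. The sketch below is a host-side model under that assumption; the type names, the stack depth, and the boolean return values are hypothetical and do not describe the actual implementation in `Esp.cpp` or `umm_malloc`.

```cpp
#include <cassert>
#include <cstddef>

namespace mock_heap_stack {

enum HeapId { kDram = 0, kIram = 1 };  // hypothetical IDs

constexpr size_t kDepth = 4;  // assumed maximum nesting depth

struct HeapStack {
    HeapId current = kDram;     // DRAM is the initial heap
    HeapId saved[kDepth] = {};  // previously selected heaps
    size_t count = 0;

    // Save the current heap and switch to `next`; fails when the stack is full.
    bool push(HeapId next) {
        if (count == kDepth) return false;
        saved[count++] = current;
        current = next;
        return true;
    }

    // Restore the most recently saved heap; fails when nothing was pushed.
    bool pop() {
        if (count == 0) return false;
        current = saved[--count];
        return true;
    }
};

}  // namespace mock_heap_stack
```

In this model a set call corresponds to `push()` and a reset call to `pop()`, so nested selections unwind in reverse order, and a caller should check the result of `push()` before assuming the heap changed.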